Cognitive Science
Washington Post urges Congress to act to prevent another cover-up of president's health amid Biden revelations
CNN host Jake Tapper told Joe Scarborough during a Wednesday conversation on "Morning Joe" that former President Biden made an effort to convince the MSNBC host that he was fit to run for re-election. The Washington Post editorial board called for more oversight of the Oval Office on Wednesday to ensure a cover-up of the president's health doesn't happen again following revelations in a bombshell book alleging the White House hid former President Joe Biden's decline from the public. "It now seems that, for a considerable time, Biden might have lacked the stamina and cognitive capacity the job demands -- and that his family and closest aides concealed this from the public," the paper's editorial board wrote. "Their apparent decision to put personal loyalties ahead of their duty to the country must be reckoned with. A legal mechanism should be considered to ensure that this doesn't happen again," the board proposed.
EEVR: A Dataset of Paired Physiological Signals and Textual Descriptions for Joint Emotion Representation Learning
EEVR (Emotion Elicitation in Virtual Reality) is a novel dataset specifically designed for language supervision-based pre-training of emotion recognition tasks, such as valence and arousal classification. It features high-quality physiological signals, including electrodermal activity (EDA) and photoplethysmography (PPG), acquired through emotion elicitation via 360-degree virtual reality (VR) videos. Additionally, it includes subject-wise textual descriptions of the emotions experienced during each stimulus, gathered from qualitative interviews. The dataset consists of recordings from 37 participants and is the first to pair raw text with physiological signals, providing additional contextual information that objective labels cannot offer. To leverage this dataset, we introduce the Contrastive Language Signal Pre-training (CLSP) method, which jointly learns representations from pairs of physiological signals and textual descriptions. Our results show that integrating self-reported textual descriptions with physiological signals significantly improves performance on emotion recognition tasks, such as arousal and valence classification. Moreover, our pre-trained CLSP model demonstrates strong zero-shot transferability to existing datasets, outperforming supervised baseline models, which suggests that the representations learned by our method are more contextualized and generalizable. The dataset release also includes baseline models for arousal, valence, and emotion classification, as well as code for data cleaning and feature extraction.
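A minimal sketch of what a CLIP-style contrastive objective over paired signal and text embeddings might look like, in the spirit of the CLSP method described above; the encoder interfaces, the temperature value, and the symmetric cross-entropy formulation are assumptions for illustration, not the paper's actual implementation.

```python
# Sketch of a CLIP-style contrastive loss pairing physiological-signal embeddings
# (e.g. from EDA/PPG windows) with embeddings of their self-reported textual
# descriptions. Encoder details are assumed, not taken from the paper.
import torch
import torch.nn.functional as F

def clsp_contrastive_loss(signal_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """signal_emb, text_emb: (batch, dim) embeddings of matched signal/text pairs."""
    # L2-normalise both modalities so the dot product is a cosine similarity.
    signal_emb = F.normalize(signal_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix: entry (i, j) compares signal i with text j.
    logits = signal_emb @ text_emb.t() / temperature

    # Matching pairs lie on the diagonal; use a symmetric cross-entropy
    # (signal-to-text and text-to-signal), as in CLIP-style pre-training.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_s2t = F.cross_entropy(logits, targets)
    loss_t2s = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_s2t + loss_t2s)
```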
MTGS: A Novel Framework for Multi-Person Temporal Gaze Following and Social Gaze Prediction
Gaze following and social gaze prediction are fundamental tasks providing insights into human communication behaviors, intent, and social interactions. Most previous approaches addressed these tasks separately, either by designing highly specialized social gaze models that do not generalize to other social gaze tasks or by treating social gaze inference as ad-hoc post-processing of the gaze following task. Furthermore, the vast majority of gaze following approaches have proposed models that handle only one person at a time and are static, and therefore fail to take advantage of social interactions and temporal dynamics. In this paper, we address these limitations and introduce a novel framework to jointly predict the gaze target and social gaze label for all people in the scene. It comprises (i) a temporal, transformer-based architecture that, in addition to frame tokens, handles person-specific tokens capturing the gaze information related to each individual; and (ii) a new dataset, VSGaze, built from multiple gaze following and social gaze datasets by extending and validating head detections and tracks, and unifying annotation types. We demonstrate that our model can address and benefit from training on all tasks jointly, achieving state-of-the-art results for multi-person gaze following and social gaze prediction. Our annotations and code will be made publicly available.
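An illustrative sketch (not the authors' released code) of the token layout the abstract describes: frame tokens plus one token per person, processed jointly by a temporal transformer, with separate heads for gaze-target and social-gaze prediction; all dimensions, class counts, and module names are assumptions.

```python
# Toy multi-person gaze model: frame tokens and person tokens share one
# temporal transformer; predictions are read off the person tokens only.
import torch
import torch.nn as nn

class MultiPersonGazeModel(nn.Module):
    def __init__(self, dim=256, num_layers=4, num_heads=8,
                 num_social_classes=3, heatmap_size=64):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads,
                                           batch_first=True)
        self.temporal_encoder = nn.TransformerEncoder(layer, num_layers)
        # Per-person gaze-target head (here a coarse heatmap over the image).
        self.gaze_head = nn.Linear(dim, heatmap_size * heatmap_size)
        # Per-person social-gaze head (e.g. looking at someone / mutual gaze / none).
        self.social_head = nn.Linear(dim, num_social_classes)

    def forward(self, frame_tokens, person_tokens):
        # frame_tokens:  (batch, T, dim)      one token per frame
        # person_tokens: (batch, T * P, dim)  one token per person per frame
        tokens = torch.cat([frame_tokens, person_tokens], dim=1)
        tokens = self.temporal_encoder(tokens)
        person_out = tokens[:, frame_tokens.size(1):]  # keep person tokens only
        return self.gaze_head(person_out), self.social_head(person_out)
```

The design point illustrated is that both tasks are predicted from the same person tokens, so joint training on gaze following and social gaze labels shares one backbone.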
existence of multiple representations of the same environment for a few sample neurons, we performed hypothesis tests for multiple
We thank all reviewers for their careful reviews and many positive comments. We feel that the typos and minor issues are easily addressable and will be corrected. We will incorporate this analysis into a revision of the paper. We thank R1 for bringing this highly related work to our attention. That work focuses on environments for which mice have previously developed spatial maps.
Introducing Flow, Google's new AI video tool and Sora competitor
Google's AI era is officially here, and at the center of it is a new generative video tool called Flow. At the Google I/O 2025 keynote on May 20, Google unveiled a new suite of AI video tools powered by state-of-the-art models. The offspring of the media models Veo 3 and Imagen 4, Flow is Google's answer to OpenAI's Sora -- an AI tool for a new era of video generation for filmmakers and creatives. However, unlike Sora, Flow comes with native audio generation baked right in. Pitched as an "AI filmmaking tool built for creatives, by creatives," Flow is the tech giant's latest attempt to demonstrate the power of AI in reshaping the creative process.
Language Models Meet World Models: Embodied Experiences Enhance Language Models
While large language models (LMs) have shown remarkable capabilities across numerous tasks, they often struggle with simple reasoning and planning in physical environments, such as understanding object permanence or planning household activities. This limitation arises from the fact that LMs are trained only on written text and miss essential embodied knowledge and skills. In this paper, we propose a new paradigm of enhancing LMs by finetuning them with world models, so that they gain diverse embodied knowledge while retaining their general language capabilities. Our approach deploys an embodied agent in a world model, specifically a simulator of the physical world (VirtualHome), and acquires a diverse set of embodied experiences through both goal-oriented planning and random exploration. These experiences are then used to finetune LMs to teach them diverse abilities for reasoning and acting in the physical world, e.g., planning and completing goals, object permanence, and object tracking.
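A hedged sketch of how embodied experiences from a household simulator could be serialized into text for finetuning an LM, following the two collection strategies named above; the simulator interface (`reset`, `step`, `plan_to_goal`, `available_actions`) and the prompt format are hypothetical stand-ins, not VirtualHome's actual API.

```python
# Convert simulator experiences into prompt/completion pairs for supervised
# finetuning. The environment object and its methods are assumed for illustration.
import random

def collect_finetuning_examples(env, goals, num_random_episodes=10):
    examples = []

    # Goal-oriented planning: record the plan that achieves each goal.
    for goal in goals:
        env.reset()
        plan = env.plan_to_goal(goal)  # e.g. ["walk to kitchen", "grab cup", ...]
        examples.append({
            "prompt": f"Goal: {goal}\nPlan the household activity step by step.",
            "completion": "\n".join(plan),
        })

    # Random exploration: record state transitions useful for object tracking
    # and object-permanence style questions.
    for _ in range(num_random_episodes):
        obs = env.reset()
        for _ in range(20):
            action = random.choice(env.available_actions(obs))
            next_obs = env.step(action)
            examples.append({
                "prompt": f"State: {obs}\nAction: {action}\nWhat is the resulting state?",
                "completion": str(next_obs),
            })
            obs = next_obs

    return examples  # later written out and used to finetune the LM
```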
EDGI: Equivariant Diffusion for Planning with Embodied Agents
Embodied agents operate in a structured world, often solving tasks with spatial, temporal, and permutation symmetries. Most algorithms for planning and model-based reinforcement learning (MBRL) do not take this rich geometric structure into account, leading to sample inefficiency and poor generalization. We introduce the Equivariant Diffuser for Generating Interactions (EDGI), an algorithm for MBRL and planning that is equivariant with respect to the product of the spatial symmetry group SE(3), the discrete-time translation group ℤ, and the object permutation group Sₙ. EDGI follows the Diffuser framework by Janner et al. (2022) in treating both learning a world model and planning in it as a conditional generative modeling problem, training a diffusion model on an offline trajectory dataset. We introduce a new SE(3) × ℤ × Sₙ-equivariant diffusion model that supports multiple representations.
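A minimal sketch of the Diffuser-style training objective the abstract builds on: a diffusion model is trained to denoise trajectories from an offline dataset, and planning later amounts to conditional sampling. The denoiser below is a generic placeholder and the noise schedule is an assumption; EDGI's actual contribution, an SE(3) × ℤ × Sₙ-equivariant denoising network, is not reproduced here.

```python
# One training step of trajectory diffusion in the Diffuser style:
# corrupt clean offline trajectories with Gaussian noise and train the
# denoiser to predict that noise.
import torch
import torch.nn.functional as F

def diffusion_training_step(denoiser, trajectories, num_timesteps=1000):
    """trajectories: (batch, horizon, state_action_dim) from an offline dataset."""
    batch = trajectories.size(0)
    # Simple linear noise schedule (an assumption; the paper may use another).
    betas = torch.linspace(1e-4, 0.02, num_timesteps, device=trajectories.device)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

    t = torch.randint(0, num_timesteps, (batch,), device=trajectories.device)
    noise = torch.randn_like(trajectories)
    a = alphas_cumprod[t].view(batch, 1, 1)

    # Forward process: noise the clean trajectories to level t.
    noisy = a.sqrt() * trajectories + (1.0 - a).sqrt() * noise

    # The denoiser predicts the injected noise; the loss is plain MSE.
    predicted_noise = denoiser(noisy, t)
    return F.mse_loss(predicted_noise, noise)
```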
Apple explores letting people control iPhones with their brains, report says
Apple is working on a way for people with physical disabilities to control devices with their thoughts. On Tuesday, the tech giant announced a partnership with brain-computer interface (BCI) company Synchron, which is developing an implantable device with electrodes that read brain signals. This technology enables Apple to translate those signals into actions like selecting icons on the screens of iPhones, iPads, and the Apple Vision Pro "without the need for physical movement or voice commands," according to the press release. Pittsburgh resident Mark Jackson, who has ALS, has the Synchron brain implant, called the Stentrode, a "stent-like device that is implanted in a vein atop the brain's motor cortex." The device "effectively translates brain waves, allowing a user to navigate around a screen and select an icon," the Journal wrote.
Intelligence on Earth Evolved Independently at Least Twice
The original version of this story appeared in Quanta Magazine. Humans tend to put our own intelligence on a pedestal. Our brains can do math, employ logic, explore abstractions, and think critically. But we can't claim a monopoly on thought. Among a variety of nonhuman species known to display intelligent behavior, birds have been shown time and again to have advanced cognitive abilities.